
Add ignore param to createDocuments for silent duplicate handling #850

Closed
premtsd-code wants to merge 9 commits into main from csv-import-upsert

Conversation

premtsd-code (Contributor) commented Apr 7, 2026

Summary

  • Adds bool $ignore = false parameter to createDocuments across all adapters
  • When ignore is true, duplicate documents are silently skipped instead of throwing
  • MariaDB/MySQL: INSERT IGNORE INTO
  • SQLite: INSERT OR IGNORE INTO
  • Postgres: ON CONFLICT DO NOTHING
  • MongoDB: ordered: false with duplicate error suppression
  • Also applies ignore to the permissions INSERT statement
  • Threads the ignore param from Database.php → Adapter → SQL/Mongo implementations
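
The per-engine mechanisms above can be illustrated with SQLite's variant. This is a minimal sketch in Python's built-in sqlite3 (not the PR's PHP); the table and column names are hypothetical stand-ins for the adapter's documents table:

```python
import sqlite3

# Hypothetical table standing in for a documents table with a unique _uid.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE documents (_uid TEXT PRIMARY KEY, data TEXT)")
con.execute("INSERT INTO documents VALUES ('doc1', 'original')")

# INSERT OR IGNORE skips the duplicate row instead of raising IntegrityError,
# and the pre-existing row keeps its original data.
con.executemany(
    "INSERT OR IGNORE INTO documents VALUES (?, ?)",
    [("doc1", "duplicate"), ("doc2", "new")],
)

rows = dict(con.execute("SELECT _uid, data FROM documents ORDER BY _uid"))
# rows == {'doc1': 'original', 'doc2': 'new'}
```

The same batch without OR IGNORE would abort on 'doc1' and never insert 'doc2', which is exactly the behavior the ignore flag is meant to avoid.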

Test plan

  • Run existing createDocuments tests to verify no regression
  • Test createDocuments with ignore: true inserting documents with duplicate IDs — should not throw
  • Verify on MariaDB, Postgres, SQLite adapters
  • Verify permissions table also uses ignore when flag is set

Summary by CodeRabbit

  • New Features

    • Bulk document creation gains an optional "ignore" mode that skips duplicate documents instead of failing, available across adapters and mirroring.
  • Performance

    • Upsert operations now pre-batch existence checks, reducing per-item lookups and improving large-batch throughput.
  • Behavior

    • Create/insert and mirror flows propagate the new ignore flag so duplicate-insert attempts are handled gracefully.
  • Tests

    • Added end-to-end tests validating duplicate-ignore scenarios.

When ignore is true, duplicate documents are silently skipped instead
of throwing. Uses INSERT IGNORE INTO for MariaDB/MySQL, INSERT OR
IGNORE INTO for SQLite, ON CONFLICT DO NOTHING for Postgres, and
ordered:false for MongoDB.
coderabbitai bot (Contributor) commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds an optional bool $ignore = false parameter across document-creation flows (Adapter interface, Pool, Mirror, concrete adapters, and Database). When ignore is true, adapters alter insert semantics to suppress duplicate-document errors and Database prefetches existing IDs to skip relationship creation for already-existing documents.
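
The ID prefetch described above can be sketched as a chunked lookup. This is a hypothetical Python/sqlite3 illustration, not the PR's PHP (which uses find() with Query::equal); MAX_QUERY_VALUES stands in for the adapter's maxQueryValues limit:

```python
import sqlite3

MAX_QUERY_VALUES = 2  # stand-in for the adapter's query-parameter limit

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE documents (_uid TEXT PRIMARY KEY)")
con.executemany("INSERT INTO documents VALUES (?)", [("a",), ("c",)])

def prefetch_existing_ids(con, ids):
    """Fetch existing IDs in chunks so no single query exceeds the limit."""
    existing = set()
    for i in range(0, len(ids), MAX_QUERY_VALUES):
        chunk = ids[i:i + MAX_QUERY_VALUES]
        placeholders = ",".join("?" * len(chunk))
        rows = con.execute(
            f"SELECT _uid FROM documents WHERE _uid IN ({placeholders})", chunk
        )
        existing.update(row[0] for row in rows)
    return existing

existing = prefetch_existing_ids(con, ["a", "b", "c", "d"])
# existing == {'a', 'c'}
```

Relationship creation (or any other per-document side effect) can then be skipped for IDs in the prefetched set, which is the point of the prefetch in ignore mode.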

Changes

Cohort / File(s) Summary
Abstract Interface
src/Database/Adapter.php
Abstract createDocuments signature updated to (..., bool $ignore = false) and docblock updated to document ignore behavior.
Delegation / Pool
src/Database/Adapter/Pool.php
createDocuments signature extended to accept $ignore and forwards all args via delegate(__FUNCTION__, \func_get_args()).
Core Database
src/Database/Database.php
createDocuments signature extended with $ignore; when $ignore is true, prefetches existing IDs to skip relationship creation for pre-existing documents and forwards $ignore to adapter calls. upsertDocumentsWithIncrease now batched-prefetches existing docs into an existingDocs lookup instead of per-item getDocument() calls.
Mirror
src/Database/Mirror.php
createDocuments signature extended to accept $ignore and forwards it to source and destination create calls.
Mongo Adapter
src/Database/Adapter/Mongo.php
createDocuments accepts $ignore; sets ordered: false for insertMany when ignoring; processes exceptions and returns inserted documents instead of throwing when processed exception is DuplicateException and $ignore is true.
SQL Adapter
src/Database/Adapter/SQL.php
createDocuments accepts $ignore; inserts and permissions INSERTs composed via new helper extension points (getInsertKeyword, getInsertSuffix, getInsertPermissionsSuffix) to support ignore semantics (e.g., INSERT IGNORE).
Postgres Adapter
src/Database/Adapter/Postgres.php
Added helpers getInsertKeyword, getInsertSuffix, getInsertPermissionsSuffix to centralize ON CONFLICT / insert composition for ignore behavior.
SQLite Adapter
src/Database/Adapter/SQLite.php
Added getInsertKeyword(bool $ignore) returning SQLite-specific insert keyword (INSERT OR IGNORE INTO when $ignore is true).
Tests
tests/e2e/Adapter/Scopes/DocumentTests.php
Added two e2e tests validating createDocuments(..., ignore: true) behavior: skipping duplicates and no-op when all inputs duplicate existing documents.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(200,200,255,0.5)
  participant Client
  end
  rect rgba(200,255,200,0.5)
  participant Database
  end
  rect rgba(255,230,200,0.5)
  participant AdapterPool
  end
  rect rgba(255,200,200,0.5)
  participant Adapter
  end
  rect rgba(240,240,240,0.5)
  participant DB
  end

  Client->>Database: createDocuments(collection, docs, ignore)
  Database->>Database: prefetch existing IDs (if ignore)
  Database->>AdapterPool: createDocuments(collection, chunk, ignore)
  AdapterPool->>Adapter: delegate('createDocuments', args...)
  Adapter->>DB: INSERT (options/suffix based on ignore)
  alt ignore = false
    DB-->>Adapter: Duplicate key error
    Adapter-->>AdapterPool: throws DuplicateException
    AdapterPool-->>Database: exception
    Database-->>Client: exception
  else ignore = true
    DB-->>Adapter: Partial success / duplicate errors suppressed
    Adapter-->>AdapterPool: returns inserted docs
    AdapterPool-->>Database: returns inserted docs
    Database-->>Client: returns inserted count
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Feat mongodb #721 — Touches adapter createDocuments and Mongo behavior; directly intersects on signature and duplicate-handling changes.
  • Fix ghost perms #813 — Overlaps with Mongo duplicate-key/error handling changes affecting createDocuments.
  • Feat relationship updates #724 — Related to createDocuments and relationship population logic; intersects on prefetch/batched relationship behavior.

Suggested reviewers

  • fogelito
  • abnegate

Poem

🐇 I hopped through rows and queries, whiskers all a-twitch,
I taught the inserts patience so duplicates can switch.
When echoes come knocking, I kindly say "ignore",
New seeds still get planted, old roots stay ashore.
Thump-thump — I celebrate, then nibble one more stitch.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 75.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely describes the main change: adding an ignore parameter to createDocuments for handling duplicate documents silently.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR adds an optional bool $ignore = false parameter to createDocuments that silently skips duplicate documents instead of throwing, implemented with database-native mechanisms (INSERT IGNORE INTO for MariaDB/MySQL, INSERT OR IGNORE INTO for SQLite, ON CONFLICT … DO NOTHING for Postgres, and ordered: false for MongoDB). It also introduces PHP-level intra-batch deduplication, a pre-fetch of already-existing IDs (now correctly chunked to respect maxQueryValues and scoped per-tenant in tenantPerDocument mode), and relationship-creation skipping for pre-existing documents. The Mirror adapter correctly threads the flag to both source and destination.

Key changes:

  • Database.php: intra-batch dedup, chunked pre-fetch of existing IDs, $preExistingIds filter before adapter call, and withTransaction wrapper updated to pass $ignore
  • SQL.php / Postgres.php / SQLite.php: polymorphic getInsertKeyword / getInsertSuffix / getInsertPermissionsSuffix hooks for per-dialect duplicate handling
  • Mongo.php: ordered: false mode outside a session/transaction, with BulkWriteException catch returning []
  • Three new e2e tests covering cross-batch, intra-batch, and all-duplicate scenarios
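
The intra-batch deduplication mentioned above can be sketched as keeping the first occurrence per ID. This is a hypothetical Python illustration of the idea only; the PR's PHP code is structured differently:

```python
def dedupe_by_id(documents):
    """Keep only the first document per ID; documents without an ID pass through."""
    seen = set()
    result = []
    for doc in documents:
        doc_id = doc.get("$id", "")
        if doc_id and doc_id in seen:
            continue  # a later duplicate within the same batch is dropped
        if doc_id:
            seen.add(doc_id)
        result.append(doc)
    return result

batch = [{"$id": "a"}, {"$id": "b"}, {"$id": "a"}, {}]
deduped = dedupe_by_id(batch)
# deduped == [{'$id': 'a'}, {'$id': 'b'}, {}]
```

Deduplicating before the adapter call matters in ignore mode because the database-level prefetch only catches IDs that already exist; two identical IDs inside one batch would otherwise both reach the INSERT.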

Resolved from prior review rounds:

  • maxQueryValues overflow when the input ID list exceeds 5,000 (now chunked)
  • $preExistingIds fetch was not tenant-scoped in tenantPerDocument mode (now correctly grouped by tenant)

Remaining concerns (from prior threads, not yet addressed):

  • INSERT IGNORE INTO on MariaDB/MySQL silences all errors, not just duplicate-key violations
  • Postgres ON CONFLICT targets omit COLLATE utf8_ci_ai, which is required for the unique-index inference to succeed — this would cause every Postgres ignore: true call to throw
  • MongoDB BulkWriteException catch returns [], losing successfully-inserted documents from mixed batches and suppressing onNext for those documents
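
The third concern can be made concrete: with unordered bulk writes, the driver's error report carries the index of each failed document, so the inserted subset is recoverable rather than lost. A minimal pure-Python sketch; the error shape below mimics, but is not, the real BulkWriteException payload:

```python
# Hypothetical write-error payload, shaped like a driver's bulk-write report:
# each entry carries the index of the input document that failed
# (code 11000 is MongoDB's duplicate-key error code).
docs = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}, {"_id": "d"}]
write_errors = [{"index": 1, "code": 11000}, {"index": 3, "code": 11000}]

def surviving_documents(docs, write_errors):
    """Return the documents not named in the error report, i.e. those
    that an unordered bulk insert actually persisted."""
    failed = {err["index"] for err in write_errors}
    return [doc for i, doc in enumerate(docs) if i not in failed]

inserted = surviving_documents(docs, write_errors)
# inserted == [{'_id': 'a'}, {'_id': 'c'}]
```

Returning [] from the catch block discards this information, which is why mixed batches lose their successfully inserted documents and their onNext callbacks.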

Confidence Score: 3/5

Not safe to merge: the Postgres ON CONFLICT collation mismatch will cause every ignore: true call on Postgres to throw, and the Mongo empty-return on mixed-batch BulkWriteException silently drops successfully-inserted documents.

Two previous P1 bugs have been correctly fixed (maxQueryValues chunking and per-tenant ID scoping). However, three previously-flagged P0/P1 issues remain open: (1) the Postgres ON CONFLICT target omits COLLATE utf8_ci_ai, breaking every Postgres ignore-insert at the DB level; (2) the Mongo adapter returns [] on BulkWriteException, silently losing persisted documents and starving onNext callers; (3) INSERT IGNORE INTO on MariaDB/MySQL suppresses all errors beyond duplicates. These are concrete runtime failures on the primary use-case path, not edge cases.

src/Database/Adapter/Postgres.php (ON CONFLICT collation), src/Database/Adapter/Mongo.php (BulkWriteException returns []), src/Database/Adapter/SQL.php (INSERT IGNORE broadness)

Vulnerabilities

No security concerns identified. The $ignore flag does not bypass authorisation checks — createDocuments still validates the PERMISSION_CREATE grant on the collection before proceeding. The flag only changes the database-level conflict-resolution strategy and does not expose any new surface for data exfiltration or privilege escalation.

Important Files Changed

Filename Overview
src/Database/Database.php Adds $ignore parameter to createDocuments, with PHP-level intra-batch deduplication, pre-fetch of existing IDs (chunked to respect maxQueryValues), and tenant-per-document scoping. Previously flagged chunking and tenant-scoping bugs are resolved in this revision.
src/Database/Adapter/SQL.php Routes $ignore to getInsertKeyword/getInsertSuffix/getInsertPermissionsSuffix helpers. Default implementation uses INSERT IGNORE INTO (MariaDB/MySQL), which silences all errors—not just duplicates—as flagged in previous threads.
src/Database/Adapter/Postgres.php Overrides insert helpers to emit ON CONFLICT … DO NOTHING, but the conflict targets omit collation (COLLATE utf8_ci_ai) required by the unique indexes—likely causing every Postgres ignore-insert to fail, as noted in prior review threads.
src/Database/Adapter/Mongo.php Uses ordered: false outside a transaction for ignore mode. On BulkWriteException returns [], losing all successfully-inserted documents from mixed batches—previously flagged issue remains unaddressed.
src/Database/Adapter/SQLite.php Correctly overrides getInsertKeyword to emit INSERT OR IGNORE INTO. No suffix override needed; inherits empty suffix from SQL base. Change is minimal and correct.
src/Database/Adapter/Pool.php Signature updated to match new $ignore parameter; delegates via __FUNCTION__ / func_get_args() as usual. No logic change needed.
src/Database/Adapter.php Abstract method signature updated with $ignore = false; PHPDoc added. Minimal, correct change.
src/Database/Mirror.php Threads $ignore to both source and destination createDocuments calls. Destination call deliberately omits $onNext/$onError callbacks (already consumed by source), which is intentional and correct.
tests/e2e/Adapter/Scopes/DocumentTests.php Three new e2e tests cover cross-batch duplicate, intra-batch duplicate, and all-duplicate scenarios. Tests are well-structured but rely on pre-filter behaviour; the mixed-batch Mongo race-condition path is not exercised.

Reviews (8): Last reviewed commit: "Deduplicate intra-batch docs by ID in ig..."

Comment on lines +2617 to +2619

protected function getInsertKeyword(bool $ignore): string
{
    return $ignore ? 'INSERT IGNORE INTO' : 'INSERT INTO';
}


P1 INSERT IGNORE INTO silences more than just duplicates

MySQL/MariaDB's INSERT IGNORE INTO suppresses all errors — not only duplicate-key violations. This includes data-truncation warnings (values silently truncated to fit the column), invalid foreign-key references being coerced to NULL, and other constraint violations. The PR intent is "silent duplicate handling", but INSERT IGNORE means a corrupt or structurally invalid row could be silently inserted with wrong data instead of raising an error.

For strict duplicate-only suppression on MariaDB/MySQL, consider catching the specific PDO error code 1062 in the exception handler (similar to how the Mongo adapter does it), or using INSERT INTO … ON DUPLICATE KEY UPDATE <col> = <col> as a no-op.
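
The reviewer's point has a direct analogue in SQLite, where INSERT OR IGNORE likewise suppresses any applicable constraint violation, not just duplicate keys. A sketch using Python's sqlite3; the CHECK constraint here stands in for MySQL's truncation and constraint cases:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id TEXT PRIMARY KEY, n INTEGER CHECK (n >= 0))")

# The CHECK violation is silently dropped too, not just the duplicate key:
con.execute("INSERT OR IGNORE INTO t VALUES ('a', -1)")  # fails CHECK, ignored
con.execute("INSERT OR IGNORE INTO t VALUES ('b', 5)")   # inserted
con.execute("INSERT OR IGNORE INTO t VALUES ('b', 9)")   # duplicate key, ignored

rows = list(con.execute("SELECT id, n FROM t"))
# rows == [('b', 5)]
```

The row ('a', -1) vanishes without any error, which is the class of silent failure the reviewer is warning about when upstream validation is the only guard.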

premtsd-code (Contributor, Author) replied:

Documents are validated by StructureValidator before reaching the adapter (Database.php:5731-5741), which checks types, sizes, required fields, and datetime ranges. Appwrite does not use foreign keys. The only error INSERT IGNORE can suppress at this layer is the duplicate key — which is the intended behavior. The alternative (ON DUPLICATE KEY UPDATE col=col) adds trigger side effects and changes rowCount() semantics.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Database.php (1)

5697-5707: ⚠️ Potential issue | 🟠 Major

ignore=true still performs relationship writes for duplicates that are later skipped.

At Line 5697, createDocumentRelationships() runs before insertion outcome is known. With Line 5706 using ignore mode, duplicate docs can be skipped by adapter, but related writes may already be committed. That breaks “silently skipped” semantics and can mutate unrelated data.

⚠️ Suggested fix (pre-filter duplicates before relationship mutation)
+        $preExistingIds = [];
+        $seenInputIds = [];
+        if ($ignore && $this->resolveRelationships) {
+            $inputIds = \array_values(\array_unique(\array_filter(
+                \array_map(fn (Document $doc) => $doc->getId(), $documents)
+            )));
+
+            foreach (\array_chunk($inputIds, $this->maxQueryValues) as $idChunk) {
+                $existing = $this->authorization->skip(fn () => $this->silent(fn () => $this->find(
+                    $collection->getId(),
+                    [
+                        Query::equal('$id', $idChunk),
+                        Query::select(['$id']),
+                        Query::limit(PHP_INT_MAX),
+                    ]
+                )));
+                foreach ($existing as $doc) {
+                    $preExistingIds[$doc->getId()] = true;
+                }
+            }
+        }
+
         foreach ($documents as $document) {
             $createdAt = $document->getCreatedAt();
             $updatedAt = $document->getUpdatedAt();
@@
+            $docId = $document->getId();
+            $isDuplicateInIgnoreMode = $ignore
+                && ($docId !== '')
+                && (isset($preExistingIds[$docId]) || isset($seenInputIds[$docId]));
+            if ($docId !== '') {
+                $seenInputIds[$docId] = true;
+            }
+
-            if ($this->resolveRelationships) {
+            if ($this->resolveRelationships && !$isDuplicateInIgnoreMode) {
                 $document = $this->silent(fn () => $this->createDocumentRelationships($collection, $document));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5697 - 5707, The code calls
createDocumentRelationships($collection, $document) (inside the
resolveRelationships branch) before the adapter->createDocuments($collection,
$chunk, $ignore) call, so when $ignore is true duplicate documents that the
adapter later skips can still have relationships created; to fix, defer or
pre-filter relationship writes: determine which documents will actually be
inserted (either by pre-checking duplicates via the adapter or by running
adapter->createDocuments inside the transaction first and then, for the returned
list of successfully created documents, call createDocumentRelationships for
those documents), ensuring relationship mutations occur only for documents
confirmed inserted (update the logic around resolveRelationships,
withTransaction, and adapter->createDocuments to call
createDocumentRelationships after insertion or filter the $chunk before
relationship creation).
🧹 Nitpick comments (1)
src/Database/Adapter/Mongo.php (1)

1495-1512: Verify that returning original documents (not inserted documents) is acceptable for callers.

When a DuplicateException occurs and $ignore is true, the returned $documents is the original input array (from line 1475), not the MongoDB response. This means:

  1. Returned documents won't have MongoDB-generated _id values populated
  2. With ordered: false, MongoDB may have successfully inserted some documents, but the caller cannot determine which ones succeeded vs. were skipped as duplicates

If callers need to know which documents were actually inserted or need the generated _id values, this silent handling won't provide that information.

Consider documenting this behavior in the method's docblock to set caller expectations:

     /**
      * Create Documents in batches
      *
      * @param Document $collection
      * @param array<Document> $documents
      * @param bool $ignore If true, silently ignore duplicate documents instead of throwing
+     *                     Note: When ignore is true and duplicates are encountered, returns
+     *                     the original input documents, not the actual inserted documents.
+     *                     MongoDB-generated _id values will not be populated in returned documents.
      *
      * @return array<Document>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Mongo.php` around lines 1495 - 1512, The current catch
in insertMany (use of insertMany, processException, DuplicateException, $ignore,
and $documents) returns the original input array when a DuplicateException is
caught, so callers will not get Mongo-generated _id values or know which records
succeeded with ordered:false; either change the behavior to return actual
inserted documents by extracting inserted IDs/results from the exception/bulk
write result (or by re-querying inserted documents) before returning, or
explicitly document this behavior in the method docblock: state that when
$ignore is true and a DuplicateException occurs the method returns the original
input records (no _id population and no success/failed distinction), and
recommend callers use a different flow if they require inserted IDs or
per-record success info.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2592-2595: The permissions INSERT for the _perms table is being
executed for all input docs even when $ignore is true, which can add permissions
for documents that were skipped as duplicates; modify the logic around
getInsertKeyword($ignore)/getSQLTable(....'_perms') and the generation of
$permissions so that when $ignore===true you only generate/insert permission
tuples for docs that were actually inserted (or at minimum filter out _uid
values known to pre-exist). Concretely: detect pre-existing _uid entries before
building $permissions (or after a RETURNING of inserted ids), filter the input
set used to build $permissions to exclude those pre-existing IDs when $ignore is
true, and then only run the INSERT into the _perms table (using
getInsertPermissionsSuffix($ignore) as before) with that filtered set to prevent
ACL drift.

---

Outside diff comments:
In `@src/Database/Database.php`:
- Around line 5697-5707: The code calls createDocumentRelationships($collection,
$document) (inside the resolveRelationships branch) before the
adapter->createDocuments($collection, $chunk, $ignore) call, so when $ignore is
true duplicate documents that the adapter later skips can still have
relationships created; to fix, defer or pre-filter relationship writes:
determine which documents will actually be inserted (either by pre-checking
duplicates via the adapter or by running adapter->createDocuments inside the
transaction first and then, for the returned list of successfully created
documents, call createDocumentRelationships for those documents), ensuring
relationship mutations occur only for documents confirmed inserted (update the
logic around resolveRelationships, withTransaction, and adapter->createDocuments
to call createDocumentRelationships after insertion or filter the $chunk before
relationship creation).

---

Nitpick comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1495-1512: The current catch in insertMany (use of insertMany,
processException, DuplicateException, $ignore, and $documents) returns the
original input array when a DuplicateException is caught, so callers will not
get Mongo-generated _id values or know which records succeeded with
ordered:false; either change the behavior to return actual inserted documents by
extracting inserted IDs/results from the exception/bulk write result (or by
re-querying inserted documents) before returning, or explicitly document this
behavior in the method docblock: state that when $ignore is true and a
DuplicateException occurs the method returns the original input records (no _id
population and no success/failed distinction), and recommend callers use a
different flow if they require inserted IDs or per-record success info.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff5730b9-5638-4b21-96b2-8f0420d026c5

📥 Commits

Reviewing files that changed from the base of the PR and between 72ee161 and b62e93e.

📒 Files selected for processing (7)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php

… of per-row getDocument

Replaces N individual getDocument() SELECT queries with a single
find(Query::equal('$id', $ids)) call. For tenant-per-document mode,
groups by tenant and fetches each group separately.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Database.php (1)

5697-5727: ⚠️ Potential issue | 🔴 Critical

ignore=true still writes and reports skipped duplicates as created.

This path resolves relationships before the insert, so a duplicate that gets ignored can already have created/updated related rows. Then the post-insert loop still increments modified and fires $onNext for every returned document, but the SQL and Mongo adapters currently return the original input array when duplicates are ignored. Please pre-filter duplicates before createDocumentRelationships() and only count/callback the actually inserted subset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5697 - 5727, The code resolves
relationships for every input document before insert and then uses the original
input array when $ignore=true, which causes skipped duplicates to still have
relationships created and still be counted/callbacked; change the flow so that
when $ignore is true you first determine/filter out duplicates (e.g., via a new
or existing adapter method like exists/findExisting/batchExists) and only call
createDocumentRelationships() for the non-duplicate subset, then call
createDocuments() and when iterating the returned $batch use the adapter’s
returned created documents (or compare returned IDs to inputs) to drive the
onNext callbacks and increment $modified so only actually inserted documents are
counted; update any places referencing createDocumentRelationships,
populateDocumentsRelationships, createDocuments, $ignore,
$inBatchRelationshipPopulation, $resolveRelationships, $onNext and $modified
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Database.php`:
- Around line 7122-7159: The unconditional pre-read now populates $old and
breaks the auth-disabled path; revert to honoring the original skipPreRead logic
used in Database::upsertDocumentsWithIncrease(): guard the batch-fetch of
existing documents (the block that builds $ids, $existingDocs and populates
$old) with the same condition used previously (i.e. skip the pre-read when
$skipPreRead is true or when !$this->authorization->getStatus() && $attribute
!== ''), so that when authorization is disabled incoming explicit $permissions
are ignored and existing permissions are preserved (creates default to []).
- Around line 7122-7150: The tenant-per-document branch currently calls find()
with Query::equal('$id', $tenantIds) for the whole tenant group which can exceed
validation limits; change the loop that builds $idsByTenant inside the Database
class so that before calling $this->authorization->skip(...
$this->withTenant(... $this->silent(... $this->find(... Query::equal('$id',
...)))) you chunk $tenantIds into batches of at most $this->maxQueryValues (or
the existing maxQueryValues property/method used elsewhere), and invoke the
find() call once per chunk, merging results into $existingDocs keyed by getId();
keep the same call chain (authorization->skip, withTenant, silent, find) and the
Query::limit(count(chunk)) for each chunk.

---

Outside diff comments:
In `@src/Database/Database.php`:
- Around line 5697-5727: The code resolves relationships for every input
document before insert and then uses the original input array when $ignore=true,
which causes skipped duplicates to still have relationships created and still be
counted/callbacked; change the flow so that when $ignore is true you first
determine/filter out duplicates (e.g., via a new or existing adapter method like
exists/findExisting/batchExists) and only call createDocumentRelationships() for
the non-duplicate subset, then call createDocuments() and when iterating the
returned $batch use the adapter’s returned created documents (or compare
returned IDs to inputs) to drive the onNext callbacks and increment $modified so
only actually inserted documents are counted; update any places referencing
createDocumentRelationships, populateDocumentsRelationships, createDocuments,
$ignore, $inBatchRelationshipPopulation, $resolveRelationships, $onNext and
$modified accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4a7ebb0a-06d4-4f9d-aa6e-bcb8ce39a615

📥 Commits

Reviewing files that changed from the base of the PR and between b62e93e and 36b048f.

📒 Files selected for processing (1)
  • src/Database/Database.php

…eries by maxQueryValues

When ignore=true, pre-fetch existing document IDs so
createDocumentRelationships() is skipped for duplicates that
will be silently ignored by the adapter.

Also chunk the batch find() in upsertDocumentsWithIncrease
by maxQueryValues to avoid exceeding query parameter limits.
- createDocuments ignore pre-fetch: group by tenant with composite
  key (tenant:id) to avoid cross-tenant false positives
- upsertDocumentsWithIncrease batch-fetch: use composite key
  (tenant:id) so documents from different tenants with the same ID
  don't overwrite each other in the lookup map
- Add tests for createDocuments with ignore=true: duplicates are
  silently skipped, originals unchanged, new docs inserted

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/Adapter/Scopes/DocumentTests.php`:
- Around line 7772-7834: The tests call createDocuments(...) with ignore: true
but never assert the returned/emitted batch semantics; update both tests (the
calls to createDocuments in
DocumentTests::testCreateDocumentsIgnorePartialDuplicates and
::testCreateDocumentsIgnoreAllDuplicates) to capture the returned $count and to
pass a callback that collects emitted/returned Document objects, then assert the
callback saw only the non-duplicate documents (e.g., doc3 in the partial case
and none in the all-duplicates case) and that $count equals the number of
actually inserted documents; this ensures createDocuments (and the duplicate
catch path in src/Database/Adapter/Mongo.php) correctly skips duplicates both in
persistence and in the emitted/returned stream.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d9f1d094-f9a6-4c75-83c0-f82c2bfa2f15

📥 Commits

Reviewing files that changed from the base of the PR and between 59aab83 and e85729d.

📒 Files selected for processing (2)
  • src/Database/Database.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Database.php

… ACL drift

- Pre-filter known-duplicate docs before passing to adapter, preventing
  permissions being inserted for existing docs (ACL drift) and fixing
  MongoDB transaction abort on BulkWriteException
- Skip transaction session in Mongo adapter when ignore=true so
  ordered:false inserts persist even if some fail
- Return empty array from Mongo catch block instead of untransformed
  cloned inputs
- Quote all column names in Postgres ON CONFLICT targets consistently
- Wrap array_chunk calls with max(1, ...) to satisfy PHPStan int<1,max>
Comment on lines +1373 to +1395
protected function getInsertSuffix(bool $ignore, string $table): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables ? '("_uid", "_tenant")' : '("_uid")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

protected function getInsertPermissionsSuffix(bool $ignore): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_type", "_permission", "_document", "_tenant")'
        : '("_type", "_permission", "_document")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

P1 ON CONFLICT targets omit COLLATE utf8_ci_ai, breaking index inference on Postgres

The unique indexes created in createCollection use COLLATE utf8_ci_ai on "_uid" (documents table, both shared and non-shared modes) and on _document (non-shared permissions table). PostgreSQL's ON CONFLICT (col_list) inference requires an exact column match — including collation — to identify which unique constraint to use. The generated conflict targets omit the collation, so Postgres will throw:

ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification

This means every createDocuments(..., ignore: true) call will fail on Postgres.

getInsertSuffix (lines 1373–1382): The unique index on the documents table is "_uid" COLLATE utf8_ci_ai [, "_tenant"], but the conflict target is ("_uid", "_tenant") / ("_uid") with no collation.

getInsertPermissionsSuffix (lines 1384–1395, non-shared branch): The unique index on _perms is (_document COLLATE utf8_ci_ai, _type, _permission), but the conflict target is ("_type", "_permission", "_document") with no collation.

The collation must be specified in the inference list to match the actual index. For example:

// getInsertSuffix — non-shared
$conflictTarget = '("_uid" COLLATE utf8_ci_ai)';

// getInsertSuffix — shared
$conflictTarget = '("_uid" COLLATE utf8_ci_ai, "_tenant")';

// getInsertPermissionsSuffix — non-shared
$conflictTarget = '("_document" COLLATE utf8_ci_ai, "_type", "_permission")';

Contributor Author


Not an issue. The existing upsertDocuments method (Postgres.php:1462) uses ("_uid") without COLLATE and works in production. Postgres infers the correct unique index when there is only one matching index on the column set — which is the case here. All Postgres ignore tests pass (verified across all 13 adapter configurations).
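As a cross-engine illustration of the ignore semantics under discussion, SQLite's equivalent (`INSERT OR IGNORE`, which this PR uses for the SQLite adapter) can be demonstrated standalone (a minimal demo; the table shape is illustrative, not the adapter's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE docs ("_uid" TEXT PRIMARY KEY, "data" TEXT)')
con.execute("INSERT INTO docs VALUES (?, ?)", ("doc1", "original"))

# INSERT OR IGNORE skips the conflicting row instead of raising
# IntegrityError, mirroring Postgres's ON CONFLICT ("_uid") DO NOTHING.
con.executemany(
    "INSERT OR IGNORE INTO docs VALUES (?, ?)",
    [("doc1", "overwrite-attempt"), ("doc2", "new")],
)
rows = dict(con.execute("SELECT _uid, data FROM docs").fetchall())
```

The duplicate row is dropped silently and the original document is left untouched, which is exactly the contract `ignore: true` promises.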

Comment on lines 1499 to +1507
  } catch (MongoException $e) {
-     throw $this->processException($e);
+     $processed = $this->processException($e);
+
+     if ($ignore && $processed instanceof DuplicateException) {
+         // Race condition: a doc was inserted between pre-filter and insertMany.
+         // With ordered:false outside transaction, non-duplicate inserts persist.
+         // Return empty — we cannot determine which docs succeeded without querying.
+         return [];
+     }

P1 onNext never fires for successfully-inserted documents in a mixed-duplicate batch

With ordered: false, MongoDB processes every document in the batch and only raises BulkWriteException after completing all writes — meaning some documents may have been genuinely persisted before the exception. When the catch block returns [], Database.php iterates that empty array and calls onNext zero times, silently swallowing results for every document actually written in that batch.

Callers that use onNext to stream processed results (cache warming, post-insert hooks, progress tracking) will miss every successfully inserted document from any batch that contained at least one duplicate — which is the common case for ignore: true bulk-loads.

A more complete fix would recover succeeded writes from the exception itself via $e->getWriteResult()->getInsertedIds() to re-fetch and return only the actually-inserted documents, rather than returning [] unconditionally.
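In clients that do expose the bulk-write result, the recovery this comment suggests reduces to a set difference over batch indexes. For example, PyMongo's BulkWriteError carries the failing positions in details['writeErrors'][i]['index']; given those, the persisted documents can be reconstructed (a sketch of the reviewer's suggestion, not the PR code, which targets the utopia-php/mongo client):

```python
def recover_inserted(batch, error_indexes):
    """Given the batch sent with ordered:false and the indexes that failed
    (e.g. PyMongo's BulkWriteError details['writeErrors'][i]['index']),
    return the documents that were actually persisted."""
    failed = set(error_indexes)
    return [doc for i, doc in enumerate(batch) if i not in failed]

batch = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]
inserted = recover_inserted(batch, [1])  # doc at index 1 was a duplicate
```

Returning these instead of an empty array would let onNext fire for every document that genuinely landed.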

Contributor Author


Known limitation. This only triggers on a race condition — pre-fetch already filters known duplicates before the adapter. The utopia-php/mongo client does not expose BulkWriteException::getWriteResult(), so we cannot recover which docs succeeded in a mixed batch. For the migration use case (single writer), this race cannot occur. Documented as accepted behavior.
